ABSTRACT

To what degree testing can become adaptive is considered in three
ways: from a formal methodological perspective; from a human process,
stability, perspective; and from a sub-system or component view within
an adaptive instructional system (AIS). With the advent of large
computer-based training systems, the opportunity to broadly implement
adaptive testing models and contrast them in terms of their adaptive
nature has come to its moment of truth. It, therefore, seems
appropriate to describe various computer paradigms which are
representative of one or more models. This completes the first third
of this paper. Testing has long been considered adaptive if the
situation is made easier or more relaxing for the student. As this
paper illuminates, it is perhaps more important to increase the
challenging aspects of the test adaptation, even to stressing
characteristics, in order to improve both reliability and validity.
Adaptive testing can be considered within the context of a total AIS
framework. To what degree does it provide for time savings and for
enhanced systems improvement? It is in this last area that so little
experience and data are available. What little data and conjecture
that can be accumulated at this time is presented to complete the
overview of adaptive testing. (Author/RC)
ADAPTIVE TESTING AS A SIGNIFICANT PROCESS IN AIM

by 

Duncan N. Hansen 

1.0 Introduction

Generic to any adaptive instructional system (AIS) is the
testing-evaluation process. Given the goal of adapting the overall
instructional learning process, it seems only natural to ask, to what
degree can testing become adaptive? For the purposes of this paper,
this question can be considered in three ways: first, from a formal
methodological perspective; second, from a human process, stability,
perspective; and third, from a sub-system or component view within an
adaptive instructional system.

In reference to the formal psychometric models, it has long been
known that many test items (too hard or too easy) provide little or no
information concerning the outcome decision to be made about the
student. If this is the case, then it seems only natural to find some
appropriate way of removing these test items without detracting from
either the reliability or validity of the assessment instrument. The
vast majority of adaptive testing models formally address only this
problem. From a systems point of view, these models have received
little or no empirical investigation. With the advent of large
computer-based training systems, the opportunity to broadly implement
adaptive testing models and contrast them in terms of their adaptive
nature has come to its moment of truth. It, therefore, seems
appropriate to describe various computer paradigms which are
representative of one or more models. This will complete the first
third of this paper.

As a student is presented with test items via either pencil and
paper or some electronic device such as a CRT terminal, he is involved
in a complex behavioral process. The testing itself presents certain
kinds of characteristics. It has long been considered adaptive if we
can make a situation easier or more relaxing for a student. As this
paper will try to illuminate, it is perhaps more important to increase
the challenging aspects of the test adaptation, even to stressing
characteristics, in order to improve both reliability and validity.
Thus, the very nature of adaptation as a behavioral process interacting
with a dynamic testing algorithm may change our thoughts and views of
the environmental conditions for optimality. Fortunately, the indices
of reliability and validity directly answer these issues.

Finally, adaptive testing can be considered within the context of
a total AIS framework. To what degree does it provide for time savings
and for enhanced systems improvement? It is in this last area that we
have so little experience and data. What little data and conjecture
that can be accumulated at this time will be presented to complete the
overview of adaptive testing.

2.0 Adaptive Testing Models for Instructional Systems

Adaptive testing models (ATM), while interesting from a theoretical
point of view, are in fact only as important as the overall adaptive
instructional system (AIS) into which they are embedded. Recognizing
that adaptive instruction is to be contrasted with more conventional
or individualized approaches, each AIM approach tends to stress the
characteristics of (1) being adapted to the specific characteristics
of each student from both a class and state variable viewpoint;
(2) providing instruction in some systematically contingent fashion;
(3) mediating the information flow so as to optimize the learning rate
and outcomes; and (4) providing empirical feedback, most importantly,
to the system so as to allow it to approximate its ultimate state of
optimality. As a framework for understanding the role of adaptive
testing, Figure 1 presents a flow of how an adaptive instructional
system would work. For our testing purposes the critical areas are
found in Step 1, Step 8 and, most importantly, in Step 10. Allow me to
elaborate.

First, the initial steps indicate how all of the a priori
information on a given student is considered and then matched within
the consideration of tasks, instructional alternatives and the
student's data profile. From this, an instructional decision rule,
sometimes referred to as an adaptive instructional model, is selected
and applied. This is scheduled and the instruction is prescribed.
After it has been implemented it receives an immediate evaluation.
This evaluation provides feedback both to the student's learning
profile and to the overall system as represented in the parameters
found in the adaptive instructional models. Thus, Step 1, the
student's learning profile, is an update of his immediate prior
performance, his learning time, and other associated learning indices,
as well as associated behavioral patterns, be they adaptive or
personality in nature.

The composition of an instructional prescription is critical in
that this represents the point of closure by which the objectives and
criterion level are formulated for a student. In Step 10, this
information is utilized as entry information into the testing process.
The testing process then consists of the presentation, under some
appropriate algorithm, of a series of items which are scored in real
time, and a decision is made. Outcomes of the actual test performance
are then utilized to update both the individual student record and the
system. We turn now to the details of this adaptive testing process.

2.1 Test Entry Processes

The testing process (Step 10) can be characterized by three
sub-processes: (a) appropriate test selection and student entry,
(b) tailored presentation of the test items, and (c) sensitive scoring
and diagnosis, interpretation and reporting. For the entry process, it
is intuitively and empirically obvious that the test or composited test
items should be selected to maximize the accuracy and meaningfulness
of the outcome decision. In addition, a student should be entered into
the test so as to minimize both trivial items and highly difficult or
impossible items while focusing on the presentation of those items
that best reflect the student's current learning competencies and
provide for appropriate discrimination among the alternatives to be
considered within the testing decisions. Therefore, any adaptive test
selection and entry process would have to be based on the student's
characteristics to be valid.

Research on computer-selected and/or computer-composed tests is
practically nonexistent. Wood (1971) reviewed the techniques for
computer-composed tests. The Naval CMI project (1973) at Memphis
illustrates how students can be routed to specific tests. Adaptive
selection of tests remains a highly promising topic for future
research. Rasch (1969) provides a model that yields equivalent
individual measurement (scores) from sets of items varying in
difficulty. itasang (1972) proposed a procedure for item weighting to
achieve invariance of test scores under varying test difficulty
levels. Obviously, a large storage capacity, general purpose computer
allows for the composition of tests in real time, a near infinite
solution to the problem.
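
The invariance idea attributed to Rasch above can be sketched in a few
lines. The block below uses the standard one-parameter logistic form of
the Rasch model; the function names, the item difficulty values, and
the bisection search are illustrative assumptions and are not taken
from Rasch (1969) or from the projects cited. It shows that an easy
item set and a hard item set give the same student different raw
scores yet lead back to the same ability estimate.

```python
import math


def rasch_probability(ability, difficulty):
    """Rasch (one-parameter logistic) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def expected_score(ability, difficulties):
    """Expected number of correct responses on a set of items."""
    return sum(rasch_probability(ability, b) for b in difficulties)


def estimate_ability(raw_score, difficulties, low=-10.0, high=10.0):
    """Find the ability whose expected score equals raw_score.

    Bisection works because the expected score rises steadily with
    ability. raw_score must lie strictly between 0 and the item count.
    """
    for _ in range(60):
        middle = (low + high) / 2.0
        if expected_score(middle, difficulties) < raw_score:
            low = middle
        else:
            high = middle
    return (low + high) / 2.0


# Hypothetical item sets (difficulties in logits): one easy, one hard.
easy_items = [-2.0, -1.5, -1.0, -0.5, 0.0]
hard_items = [0.0, 0.5, 1.0, 1.5, 2.0]

true_ability = 0.5
easy_score = expected_score(true_ability, easy_items)
hard_score = expected_score(true_ability, hard_items)

# Different raw scores, same recovered ability.
ability_from_easy = estimate_ability(easy_score, easy_items)
ability_from_hard = estimate_ability(hard_score, hard_items)
```

The raw scores differ because the item sets differ in difficulty; the
ability scale is what stays comparable across the two sets.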

In turn, adaptive entry of a student into a test arranged in a
difficulty hierarchy remains unexplored. Owen (1969) has developed a
procedure for applying Bayesian concepts to either the appropriate
determination of a test or the tailoring of test items to each
student, the methodology being appropriate for each problem. The
Bayesian models offer a number of distinct advantages:

1. The step size of difficulty between tests can be of the
examiner's choice.

2. The choice of entry is dependent upon previously collected
data on each student.

3. The choice of a scoring method is less important and is
primarily governed by the choice of a loss function selected by the
examiner.

4. All of the test item parameters are permitted to vary.

Unfortunately, there have been no empirical findings to support these
views.

The adaptive entry of a student into a test arranged from a skill
hierarchy remains to be investigated. In a more integrated
instructional and testing paradigm, Suppes (1968) has provided for
individualized entry for well over 50,000 students in a mathematics
CAI drill and practice program. The results indicate that students can
be given appropriate entry based on the single variable of grade level
and find an appropriate performance level within a minimum of one hour
of instruction.

It should be observed that each of these programs utilized only
one variable (grade level) for the predicted entry placement. If
multivariate regression techniques were utilized, it would undoubtedly
be true that a much more precise placement could be determined. It
should be observed, though, that the evaluation of placement for
adaptive testing will have to be determined in terms of the criterion
of minimum number of test item presentations, since the behavioral
evaluation is elusive at best, and perhaps impossible to answer in
terms of student self-ratings.
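
A multivariate placement rule of the kind suggested above could be
sketched as follows. This is a minimal illustration, not the procedure
used in any of the programs cited: the two predictors (grade level and
a prior performance average), the data values, and the function names
are all hypothetical. The fit is ordinary least squares solved through
the normal equations, and the prediction is rounded and clamped to a
valid entry level.

```python
def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows holds one tuple of predictor values per student. The returned
    list is [intercept, weight_1, weight_2, ...]. The predictors must
    not be perfectly collinear, or the pivot below will be zero.
    """
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(x[i] * x[j] for x in design) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [value / pivot for value in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def predict_entry_level(coefficients, features, n_levels):
    """Predicted entry level, rounded and clamped to 0 .. n_levels - 1."""
    raw = coefficients[0] + sum(
        c * f for c, f in zip(coefficients[1:], features)
    )
    return min(n_levels - 1, max(0, round(raw)))


# Hypothetical records: (grade level, prior average) -> observed level.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
targets = [4.0, 5.5, 9.5, 10.5, 15.0]
coefficients = fit_linear(rows, targets)
```

With more predictors the same function applies unchanged; whether the
added precision reduces the number of items presented is the empirical
question raised in the paragraph above.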

2.2 Tailored Presentation of Test Items

After the student has been placed in a test, the test item
presentations should be designed or tailored so as to match items to
the current performance or ability level of students. Simply stated,
the student should always be presented with those items that best
match his competencies as well as providing the greatest
discriminations. As he begins to fail, the test should be terminated
as quickly as possible. As an overall point of view, testing should be
minimized to that degree which minimizes the risk of error to an
acceptable degree within the instructional system. In considering the
number of techniques offered for tailoring the tests, reference will
be made to a number of reviews by different groups: Weiss (Weiss and
Betz, 1973) at Minnesota, Hansen (1973) at Memphis State University,
etc.

While Weiss argues actively for a two-stage testing model, serious
consideration of a number of factors led our group to consider the
flexilevel model developed by Lord (1971). The flexilevel model starts
a student with a middle difficulty item and proceeds by presenting the
next easier item after each wrong response and the next harder item
after each correct response. Testing is stopped after n items, where
n is defined as (N + 1)/2 and N is the total number of items of the
test. Lord found through computer simulation studies that the
flexilevel model yields highly satisfactory results if the difficulty
step size is in the range .033 to .067. This model is quite
advantageous for two reasons: first, the reduction in test items is
clearly specifiable, and second, potential paper and pencil
applications are also feasible. Moreover, the test item pool can be
directly implemented from an existing conventional test, a highly
important developmental factor.
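
The flexilevel branching rule described above can be sketched as
follows. This is an illustrative reading, not Lord's own program: it
assumes an odd number of items sorted from easiest to hardest, takes
"next easier" and "next harder" to mean the nearest item not yet
presented in that direction, and stops after (N + 1)/2 items, which is
how the flexilevel procedure is usually stated. The function and
parameter names are hypothetical.

```python
def flexilevel(n_items, answers_correctly):
    """Administer a flexilevel test over items sorted easiest to hardest.

    n_items must be odd. answers_correctly(index) returns True or False
    for the item at that position. Returns the (item_index, correct)
    pairs in the order the items were presented.
    """
    if n_items % 2 == 0:
        raise ValueError("n_items must be odd")
    middle = n_items // 2
    next_easier = middle - 1   # nearest unused item below the middle
    next_harder = middle + 1   # nearest unused item above the middle
    current = middle
    administered = []
    for _ in range((n_items + 1) // 2):
        correct = answers_correctly(current)
        administered.append((current, correct))
        if correct:
            current = next_harder
            next_harder += 1
        else:
            current = next_easier
            next_easier -= 1
    return administered


# Hypothetical student who passes every item up to position 4 of 0..6.
sequence = flexilevel(7, lambda index: index <= 4)
presented = [index for index, _ in sequence]
```

For a 7-item pool every student sees exactly 4 items, so the reduction
in test length is fixed in advance, which is the first advantage noted
above.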

The various problems raised by tailored testing discussed above
are summarized by Lord as follows: "Until now, even some very
primitive questions about how to carry out tailored testing did not
have even vague answers." If these problems are confusing even to the
psychometricians, how can the educational sector have confidence in
tailored testing? A mature summary of problems and advantages
indicates the wisdom of further research and development.

In some of the studies reported (e.g., Angoff and Huddleston,
1958; Cleary, Linn, and Rock, 1968), as many as 20% of the students
were misclassified by the routing test. In the case of conventional
testing, misclassification of students is similarly unavoidable, since
no training test of today is perfectly valid and reliable. Given
equivalent weakness for each approach, the use of improved test
development methodology is the best course of action.

Another serious weakness of tailored testing is that although it
is better for the extreme ability groups, it provides less accurate
measurement for the average individual than that of a "standard" test.
Lord gave tailored testing an apparent "fatal blow" in this comment:

    If, for example, 500 items were available for tailored
    testing, better measurement will often be obtained by
    selecting, for example, the n = 60 most discriminating items
    (highest a) and administering these as a conventional test,
    rather than by using all 500 in a tailored-testing procedure.
    This may actually prove to be a fatal objection to any general
    use of tailored testing.

This remark would hold if tailored testing were applicable only to
normative ability measurement, such as the GRE or the SAT. However, in
reaction to this restricted viewpoint of tailored testing, Green (1970)
argued that "the computer's failure to improve on conventional testing
in this situation does not foreclose the possibility of computer
advantages in other cases." A very similar opinion was shared by
Crick (1972), who reacted: "Lord's restricted view of testing, while
certainly a legitimate one, does not exhaust the possible applications
of computer-assisted testing."

In discussing the prospects of tailored testing, it seems that
the following points are pertinent:

1. One reason for Lord's negative comment on tailored testing is
the strategy of comparison with a standard test (i.e., a conventional
peaked test). However, in comparing tailored testing with a
"published" test (Lord's definition of a conventional unpeaked test),
his findings indicated that "the tailored procedure gives more accurate
measurement than the unpeaked conventional test for all students
regardless of level." Thus, in most instructional contexts, tailored
testing is apparently the most effective approach.

2. It has also been shown that tailored testing permits a drastic
reduction of test items without much loss in the reproducibility of
the total test scores.

3. One novel application was made by Ferguson (1969), who used
tailored testing in a hierarchical criterion-referenced measurement
situation. Concerning the potential usefulness of tailored testing
for this purpose, Crick commented: "Intuitively, tailored testing
makes much more sense for a criterion-referenced measure than for a
norm-referenced measure since the goal of tailored testing is to
adjust the test to the individual."

4. In individualized approaches to instruction, it seems that
Lord's flexilevel testing may have wide applicability. In the pretest,
every subject would take the easy set of the items; but, in the
posttest, the subjects would take the difficult set instead. Thus, the
use of parallel forms of the test can be avoided. Furthermore,
since the subjects would not have been exposed to many of the harder
items, the carryover effects of testing can be minimized. Although
Lord developed flexilevel testing, he has not emphasized the use
of it in this context.

5. Tailored testing is appropriate also in the affective domain
of measurement. Tam (1973, a study to be presented later) found that
a flexilevel model yielded reliability and validity indices equivalent
to the total conventional test, and an empirically observed
stop-criterion reduced the test length significantly beyond the 50%
level.

The prospects of tailored testing depend on willingness to explore
its various uses, and the above list is by no means exhaustive. It is
hoped that more rigorous explorations of tailored testing will lead to
Green's prediction of the "inevitable computer conquest of testing."

2.3 Scoring and Reporting Procedures

The scoring procedures (right/wrong, average difficulty indices,
an average of correct item difficulty indices, etc.), the diagnostic
interpretation, and the report (quantitative and/or verbal) should be
sensitive to all information obtained from the test completed by the
student. For example, a bright student who is having a bad day should
be treated differently from the marginal student who is all but
failing in the course. Each factor in this three-process
representation of adaptive testing should reflect both individual
student data and the requirements of the training system so as to
maximize the student's learning rates and mastery performance as well
as the efficiency of the training system.

For this highly important third process of adaptive testing,
limited research findings (theoretical, simulated, or empirical) have
been reported. The reviews above subsumed the preponderance of work
to date. Therefore, this section will focus on promising topics of
further study.

Most scoring procedures utilize the dichotomous right-wrong
summed score. Three promising alternatives appear to be feasible.
First, one could differentially weight items so that the most
discriminating items relative to the criterion decision zone, rather
than the total score, have the most decisive influence. Studies of
item weighting indicate that weighting can improve decision making as
well as test psychometric characteristics. Thus alternative weighted
scoring procedures are promising and feasible given a computer's
calculation capacity.

In turn, the aggregating or summation process for the total score
should be studied; Green (1970) posits that a mean of difficulty
indices for correct responses offers the most accurate procedure.
Similar composite score procedures that stress minimally acceptable
mastery levels should be investigated.
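
The contrast between the summed right-wrong score and a mean of
difficulty indices for correct responses can be sketched briefly. The
block is an illustration only: it assumes each response is recorded as
a (difficulty, correct) pair with larger difficulty values meaning
harder items, and the function names and data are hypothetical rather
than taken from Green (1970).

```python
def summed_score(responses):
    """Dichotomous right-wrong score: the count of correct responses."""
    return sum(1 for _, correct in responses if correct)


def mean_correct_difficulty(responses):
    """Mean difficulty index over the items answered correctly.

    Returns None when no item was answered correctly, since the mean
    is undefined in that case.
    """
    passed = [difficulty for difficulty, correct in responses if correct]
    if not passed:
        return None
    return sum(passed) / len(passed)


# Two hypothetical students routed to different parts of the item pool.
student_a = [(0.2, True), (0.4, True), (0.6, False), (0.8, False)]
student_b = [(0.6, True), (0.8, True), (0.9, False), (1.0, False)]
```

Both students have a summed score of 2, but the difficulty-based score
separates them (about 0.3 against about 0.7), which is the reason such
a score suits tests in which students do not see the same items.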

Finally, there is important information in the error responses
elicited from students. Bock (1972) proposes an item estimation
procedure that yields differential information from error alternatives.
Intuitively, a "nearly correct" response is more adaptive than a
"dum-dum" response. In turn, these error patterns may yield highly
important differential categories of students who have partial
knowledge. For one group, the remedial alternative of test item review
would be sufficient to achieve mastery, while the other extreme group
may achieve mastery only through a totally new training strategy. Large
student flow and a computer are required to implement the Bock model.

In terms of diagnostic requirements, total test scores and item
pass-fail indices are far too summarized for instructional inference
making. Measurement within instruction should yield an individual
performance profile that indicates the structure and "valley" of
weakness. Profile techniques could yield insights like "the verbal
indices are so low that only a high multimedia with audio training
approach will insure mastery," or "the uniform pattern of indices
indicates that incentives to enhance motivation will insure fast
mastery." While speculative in nature, the individual performance
profiles interface directly into an adaptive instructional model at
this operational juncture.

Interpretation of adaptive tests can be viewed as an "actuarial
to clinical" challenge. As sufficient test data bases are collected,
refined classification techniques (discriminant analysis) and
statistical decision models can be constructed so as to improve the
predictive aspects of the interpretation. While a futuristic form of
research, the ultimate requirement should be investigated so as to
have the full potential of adaptive training (instruction and testing)
achieved.

In regard to reports, the recurrent problem of instructors,
supervisors, etc., understanding numerical or statistical outputs is
still present. Graphical and verbal reports should be considered
and studied. The sufficiency of information for instructional decision
making and monitoring is critical. As cited in the Hansen, Hedl, and
O'Neil (1971) review, automation of the report process is both feasible
and desirable in terms of cost and resource utilization. A consumer
survey methodology could be profitably employed at this stage.
Obviously, adaptive tests will only be useful to the degree that their
results are utilized in a sound, rational manner.

From a modeling viewpoint, the need for empirical research far
outdistances our ability to generate ideas or psychometric models. We
turn now to a computer based paradigm for implementing this approach.

2.5 Flexilevel Mastery Test

Using the concept of a three-phased adaptive testing process, an
approach to adaptive mastery testing has been developed both in the
TUTOR language for the PLATO system at the University of Illinois and
on the Sigma 9 system at Memphis State University. From a student
point of view, the procedure runs as follows: (a) the student signs on
at the computer terminal, (b) he enters control processing, (c) the
system selects the test and entry level for him, and (d) he executes
the adjusted flexilevel item presentation which will assess his
performance. After he has completed the adaptive portion of the test,
all remaining items are presented. If he has demonstrated an
acceptable level of performance, the system then decides whether to
(a) assign the next flexilevel test, reenter the student in control
processing, and once again begin the flexilevel sequence, or (b) sign
him off, an option available to the instructor for acceleration.
Figure 2 presents a flowchart of a student moving through each of
these answers. A more detailed description follows.

In signing on, the student enters his name and the computer
executes a security check designed to limit system accessibility and
assure test security. Once he has completed the required sign-on
activities, the computer system checks his performance record and
aptitude profile to determine which of the tests he is ready for. The
system also determines his entry level in the chosen test. Thus, the
student is provided the most timely entry test point in terms of his
recorded performance, aptitudes, and current in-course status.

Figure 2. Flowchart of student progress through flexilevel testing
program.

Student readiness indices would include previous instructional
activities, courses completed, formal education, and other objective
learning indices. His aptitude profile might include his test scores
on the standardized college entrance tests. His current instructional
status identifies how far along he has come in the course. Together
these data enable control processing to almost instantaneously compute
a predictor equation based on these variables.

Once the predictor equation is determined, the computer system
translates it into an appropriate flexilevel test whose difficulty and
scope are adjusted to the student's predicted performance. He is
therefore provided an evaluation experience individually tailored to
his current status. He executes this test on the computer terminal,
a useful medium not only because of its rapid response but also because
of its transitory display, which augments test security.

The student enters the test at the difficulty level that has been
predicted appropriate. If he misses an item, he continues down the
difficulty scale until he gets one correct. This establishes his
in-test performance base, from which subsequent flexilevel items
originate.
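
The entry rule just described can be sketched as a short search. This
is an illustrative reading under stated assumptions: items are indexed
from easiest (0) to hardest, a correct answer on the very first item
makes the entry level itself the base, and running out of easier items
leaves the base undefined. The function and parameter names are
hypothetical and do not come from the TUTOR or Sigma 9 programs.

```python
def find_performance_base(entry_index, answers_correctly):
    """Step down from the predicted entry item until one is answered
    correctly.

    Returns (base_index, administered), where administered lists the
    (item_index, correct) pairs presented. base_index is None if the
    student misses every item from the entry level down to the easiest.
    """
    administered = []
    index = entry_index
    while index >= 0:
        correct = answers_correctly(index)
        administered.append((index, correct))
        if correct:
            return index, administered
        index -= 1
    return None, administered


# Hypothetical student entered at item 5 who can pass items 0 through 2.
base, log = find_performance_base(5, lambda index: index <= 2)
```

Here the student misses items 5, 4 and 3 and passes item 2, so item 2
becomes the base from which the flexilevel sequence would start.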

When he has completed the adjusted flexilevel test, the remaining
test items are presented and the student responses are evaluated (see
Figure 1). Green's scoring procedure will be used to evaluate the
flexilevel portion of the test, while the entire test will be evaluated
using performance criterion scoring procedures. Thus, for each student
a tailored test score and a conventional test score will be available.
If full mastery, based on the entire test score, is achieved, the
student is provided the opportunity to take the next lesson. If he
elects to, he then reenters control processing and begins the same
sequence in the next assigned flexilevel test.

In the case of test failure, the student goes offline for course
remedial activities keyed to his learning deficiencies. Following
remediation, all students reenter control processing and restart the
flexilevel testing cycle.

After the student attains performance mastery, as a result of
either the initial or postremedial test score, the system then decides
if he should continue to the next test. If time permits, he most likely
will be routed to control processing for a performance prediction
update and subsequent testing. If further testing is not prescribed,
he is signed off.

Other paradigms have been implemented. Weiss (1974) and his
colleagues have a two-stage FORTRAN-based program. Ferguson (1969)
and ISU have elaborate hierarchical skills test paradigms. The Bock
procedure for critical zone analysis has been implemented at MSU. We
turn now to some empirical results that substantiate these models and
computer paradigms.

3.0 Adaptive Processes and Validation 

As was presented in the introduction, ATM's should allow not only
for systems adaptation but also for significant behavioral adaptation
for the student. As indicated before, the amount of empirical work to
assess this adaptation, especially from a reliability and validity
point of view, has been exceedingly limited. There are now sufficient
starts made in this endeavor to indicate what some of the likely
trends appear to be, namely, ATM's provide equivalent or slightly
improved reliability and validity measures. This tentative finding
appears to hold for asymmetrical score distributions as found in
criterion or mastery testing.

3.1 Hedl Study of Intelligence Testing

The first study which examined a computer based adaptive test was
performed by Hedl (1971). A computer based version of the Slosson
Intelligence Test (CB-SIT) was designed to operate with an IBM 1500
computer instructional system. The test items were presented
individually, as commonly found in all individualized intelligence
testing, via a CRT terminal. Students entered their answers for
immediate computer evaluation, which was based on various key word
answer algorithms. For the reliability and validity study, 48
undergraduate students were individually tested with the WAIS, SIT,
and the CB-SIT. As can be seen in Table 1, the modified split-half
reliability correlations for the computer based intelligence test were
lower but essentially equivalent to those of the human administered
tests. Table 2 presents indices concerning the concurrent validity,
which yield moderately strong concurrent relationships. Perhaps most
impressively, Table 3 presents the multiple regression analysis on
grade point average; surprisingly, the computer based test proved to
be the superior predictor. As indicated in Table 4, computer based
testing led to significantly heightened anxiety as well as a decrease
in positive attitude toward the testing. Thus, one can interpret this
finding as indicating that computer based adaptive testing may
increase the stress and, consequently, anxiety reactions toward the
assessment situation. There is a reasonably consistent pattern for
improved validity and acceptable levels of reliability.

Table 1

Modified Split-Half Reliability Coefficients (Hedl Study)

                          CB-SIT    CB-SIT*    SIT      SIT*
Total Group (N = 48)       .66       .79       .79      .88

* Adjusted for test length

Table 2

Coefficients of Correlation (Concurrent Validity), Hedl Study

                          WAIS      WAIS      WAIS
               SIT        VIQ       PIQ       FS-IQ
CB-SIT         .75        .55       .32       .54

Table 3

Multiple Regression Analyses (Hedl Study) with GPA

                                               R       R²
GPA = -.57 - .66 (Sex) + .01 (CB-SIT)         .66     .44
GPA =  .40 - .65 (Sex) + .02 (WAIS)           .56     .32

Table 4

Means for STAI A-State Scores and Attitude Scores

                      CB-SIT            SIT             WAIS
                    Pre    Post      Pre    Post      Pre    Post
Anxiety Means      10.7    12.1      9.5     9.2      9.1     9.8
Attitude Means     73.2    66.9     70.0    71.5     70.9    75.5

3.2 MSU Study of Adaptive Mastery Testing

Our group at Memphis State University has implemented the computer
paradigm of adaptive testing described in Section 2.5. As a follow-up
to this, a quasi-individualized, modularly adjusted course in beginning
graduate level statistics and research methodology was utilized as the
context for assessing the reliability and validity of computer-based
adaptive testing as opposed to conventional paper and pencil testing.
Utilizing two different groups, the students were presented with a
paper and pencil version of the test and a computer version presented
over either a CRT or teletype terminal. Varying predictor variables
were used for the entry techniques; the students' grade point average
and their running average scores on modules were the main determiners.
As presented in Table 5, the mean performance on either version of the
test tended to be asymmetrical in nature, with this being more pronounced
for the module test, which can be thought of as a complex multi-lesson
test. In turn, the reliability coefficients were within the
acceptability range. In passing, it should be noted that a modified
odd-even technique was utilized; this is similar to that employed in
the Hedl study. Unfortunately, this estimation technique tends to
underestimate the reliability but is the only available one for
assessing adaptive testing sequences, given that they vary as to length
and precise item equivalence. Simply, there is a need for new
reliability estimation procedures for the adaptive testing situation.
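
For orientation, the conventional odd-even (split-half) estimate can be sketched as follows. This is the textbook procedure with the Spearman-Brown step-up, not the modified technique used in the study, whose details are not given here. The function names and the toy score matrix are illustrative assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to a full-length reliability estimate."""
    return 2 * r_half / (1 + r_half)


def odd_even_reliability(responses):
    """responses: one list of item scores per student, in presentation order.

    Rows may differ in length, as adaptive sequences do; each student's
    items are simply split into odd and even positions.
    """
    odd_totals = [sum(row[0::2]) for row in responses]   # 1st, 3rd, 5th, ...
    even_totals = [sum(row[1::2]) for row in responses]  # 2nd, 4th, 6th, ...
    return spearman_brown(pearson(odd_totals, even_totals))


# Toy data (hypothetical): three students, four items scored 0 or 1.
scores = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0]]
reliability = odd_even_reliability(scores)
```

Because each student in an adaptive sequence sees a different set and number of items, the two half-scores are not built from matched items, which is one reason an estimate of this kind can be unstable in the adaptive case.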
Table 5

Means, Reliability and Validity
(Conventional with Adaptive) Coefficients for MSU Study

                          Mean Percent    Reliability (rtt)    Validity

Final Exam (N = 28)
  Paper and Pencil             --               .84
  Adaptive Test                82               .87               .91

Module Tests (N = 33)
  Paper and Pencil             92               .77               .87
  Adaptive Test                90               .71

Table 6

Reliability and Validity for Affective Adaptive Tests
(Tam Study) of Three Levels of Perceived Teaching

                            Reliability                   Validity
                       TH    TA    TL   Tpool       TH    TA    TL   Tpool

Flexilevel            .89   .97   .95   .97        .91   .99   .96   .98
Branched              .57   .88   .85   .92        .60   .88   .85   .92
Two-stage Flexiblock  .90   .93   .79   .94        .91   .87   .65   .91



The total scores were then correlated to yield the validity
coefficient. As can be seen in Table 5, these are not only significant
but quite substantial. Thus, one finds a fairly reasonable outcome for
adaptive testing, that is, it tends to yield reliability and validity
coefficients equivalent to those found for conventional testing. This
study is continuing and ultimately shall reveal validity measures
relating to projects and instructor ratings. Additionally, the Air
Force Human Resource Laboratory is contracting to further replicate
and extend these paradigms and findings.

3.3 Adaptive Testing On Affective Behaviors

Tam (1973), while at Florida State University, performed an
assessment of the reliability and validity for adaptive testing of an
affective domain, namely, a Thurstone scale of students' attitudes
toward teaching effectiveness. Utilizing a within-subject design, Tam
presented items which varied from very negative to very positive. A
student was allowed to move among the flexilevel adjustments according
to the prescribed Lord algorithm. He was terminated once he had agreed
three consecutive times, be this at a positive or negative point in
the scale. All entries were made at the midpoint in the scale as
suggested by Lord. For the purposes of comparison, Tam compared three
independent groups, one under a flexilevel algorithm, a second under a
branching algorithm and a third under a two-stage flexiblock algorithm.
As can be seen in Table 6, the flexilevel adaptive test yielded
substantially the best reliability and validity coefficients. The
three groups considered were those teachers who were rated high, average
and low, these being pooled over a number of teachers. As can be seen
by the magnitude of the coefficients, one can judge affective adaptive
testing to be a highly reliable and valid activity.
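
The flexilevel routing referred to above can be sketched for the ordinary ability-testing case, as Lord's procedure is usually described: items are ranked on the scale, entry is at the middle item, a positive response leads to the next higher unused item, a negative response to the next lower unused item, and (N + 1) / 2 items are given in all. How Tam mapped agreement and disagreement onto movement along the Thurstone scale is not stated here, so the optional `stop_streak` rule below (stop after that many consecutive positive responses) is only an assumption modeled on the three-consecutive-agreement criterion, and all names are hypothetical.

```python
def flexilevel(n_items, respond, stop_streak=None):
    """Administer a flexilevel sequence over items ranked 0 (lowest) to n_items - 1.

    respond(i) returns True (correct / agree) or False for item i.
    Returns the administered (item_index, response) pairs in order.
    """
    if n_items % 2 == 0:
        raise ValueError("flexilevel routing assumes an odd number of items")
    middle = n_items // 2
    next_up, next_down = middle + 1, middle - 1
    current = middle                      # entry at the midpoint of the scale
    test_length = (n_items + 1) // 2      # fixed length of the full procedure
    administered = []
    streak = 0
    for _ in range(test_length):
        answer = respond(current)
        administered.append((current, answer))
        streak = streak + 1 if answer else 0
        if stop_streak is not None and streak >= stop_streak:
            break                         # assumed early-termination rule
        if answer:
            current, next_up = next_up, next_up + 1        # next higher unused item
        else:
            current, next_down = next_down, next_down - 1  # next lower unused item
    return administered


# Hypothetical respondent who responds positively to items 0 through 3 only.
path = flexilevel(7, lambda item: item <= 3)
```

With seven items, this respondent is routed through items 3, 4, 2 and 5, so four of the seven items are presented.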

Perhaps more interestingly, Tam assessed in a posteriori fashion
the actual test length required, if a student had been appropriately
placed at the positive or negative end of the continuum based on the
prior known means for the teacher. Under these conditions it was
found that three or fewer items had to be presented in order to start
the Thurstone match required by the design. Such efficient
identification of the affective state of the student is quite impressive.

3.4 Summary of Behavioral Studies

While limited in number and scope, these studies indicate a trend,
namely, adaptive testing yields equivalent or slightly superior
reliabilities and validities given a significant reduction in test
items. There is some indication that more stress, anxiety, and perhaps
reality is found in adaptive testing. While no one likes stress, it
may be a precursor to improved validity. Given the range of test
content, the findings appear to have robust generalizability.


4.0 System Factors and Adaptive Testing

As presented in Figure 1, the adaptive testing process not only
provides feedback to the pass-fail decision process for the student
but ought to assist in the cybernetic [illegible] of the system; a
wonderful concept but yet to be realized. This section shall review the
time saving in adaptive testing and move on to the proposed paradigms
for systems feedback.

As is to be expected, there is a significant saving under flexilevel
item presentation. The MSU groups indicated that only 31% of the
items are utilized given individualized test entry. This yields a
[illegible] saving in testing time. The Tam study indicated that 6.3
items as opposed to 16 items yield reliable and valid results. The Hedl
study did not have an adaptive entry or termination, but post hoc
analysis indicates a 10% time savings. Given that most individualized
AIM systems commit up to 20% of student time to testing, these 50% or
greater values are highly significant in a systems efficiency sense.
Further replications over systems and test types are obviously required.
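
The arithmetic behind these savings can be made explicit. The sketch below uses the Tam item counts quoted above and, as illustrative inputs only, a testing share of 20 percent of student time and a 50 percent reduction in testing time; the function names are hypothetical.

```python
def fraction_saved(adaptive_amount, conventional_amount):
    """Proportion of items (or time) saved by the adaptive version."""
    return 1 - adaptive_amount / conventional_amount


def overall_time_saved(testing_share, testing_saving):
    """Share of total student time freed when testing itself is shortened."""
    return testing_share * testing_saving


# Tam study: 6.3 items on average as opposed to 16 items.
tam_item_saving = fraction_saved(6.3, 16)          # roughly 0.61

# Illustrative system-level effect: testing is 20% of student time
# and adaptive testing halves it.
system_saving = overall_time_saved(0.20, 0.50)     # 0.10 of total student time
```

On these inputs, halving a testing load that occupies a fifth of student time frees about a tenth of total student time for other activity.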

4.1 Systems Cybernetic Effects

Surprisingly, there are few suggestive conjectures relating to
systems feedback. While mentioned frequently since Stolurow's use of
the concept, the operationalization of feedback tends to be a null set
for AIM. Let us consider some concrete possibilities.

First, the flow logistics and management of the system is paramount.
Overload is the most frequent cause of AIM failure. Test
pass-fail rates and time consumptions are highly idealized indices of
the system. These can be used to monitor the system and seek
quasi-optimal states. Most importantly, modeling of these may be a first
step to optimizing the system in a rigorous quantitative manner.
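
As one concrete reading of this first possibility, the two indices can be computed from test records and compared against operating limits. This is a minimal sketch only: the record layout, the function names and both threshold values are assumptions made for illustration and are not taken from any of the systems discussed.

```python
def system_indices(records):
    """records: list of (passed, minutes) pairs, one per completed test.

    Returns the pass rate and the mean testing time in minutes.
    """
    count = len(records)
    pass_rate = sum(1 for passed, _ in records if passed) / count
    mean_minutes = sum(minutes for _, minutes in records) / count
    return pass_rate, mean_minutes


def overloaded(pass_rate, mean_minutes, min_pass_rate=0.70, max_minutes=45.0):
    """Flag a possible overload state; both limits are hypothetical."""
    return pass_rate < min_pass_rate or mean_minutes > max_minutes


# Hypothetical records for one module on one day.
today = [(True, 30.0), (True, 40.0), (False, 50.0), (True, 40.0)]
rate, minutes = system_indices(today)
alarm = overloaded(rate, minutes)
```

Tracking these two values per module over time would give the raw series that any later quantitative model of the system would have to fit.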

Concurrently, the opportunity to reduce testing time should
provide time for the class-state measurement. Class is a concept that
relates via cluster analysis common student characteristics to optimal
instructional treatments. To know how many group treatment
relationships are necessary represents the rank of the system. In turn,
individual state-to-state variations are at the heart of AIM. Therefore,
these class-state indices are the core for forming profiles and
prescriptive algorithms.

Finally, adaptive testing allows for system adaptation in that
shifts in criterion levels by manpower loads or test-remediation
subprocesses by pipeline flow are at the heart of system readjustments.
The concept of readjustments is hardly new but rarely approached in
a dynamic process manner.